-
Notifications
You must be signed in to change notification settings - Fork 494
UCT/CUDA: Add support for flush_ep #10991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
WalkthroughAdds a CUDA-specific endpoint flush API that enqueues per-stream flush events tied to a shared flush descriptor, introduces flush event types and a flush descriptor pool, updates progress to treat flush events specially, and adds rollback and cleanup on allocation/enqueue failures. Changes
Sequence Diagram(s)sequenceDiagram
participant Caller
participant EP as Endpoint
participant IFACE as CUDA_Base_IFACE
participant Stream as GPU_Stream
participant Progress as Progress_Loop
Caller->>EP: ep_flush(comp)
activate EP
EP->>IFACE: uct_cuda_base_ep_flush()
activate IFACE
IFACE->>IFACE: alloc flush_desc (stream_counter = N)
loop per active stream (N)
IFACE->>Stream: enqueue per-stream flush event (flush_stream_desc)
end
EP-->>Caller: return (pending)
deactivate EP
deactivate IFACE
par Stream completions & progress polling
Stream->>IFACE: stream event triggers callback
activate IFACE
IFACE->>IFACE: uct_cuda_base_stream_flushed_cb() — decrement counter
alt counter == 0
IFACE->>Caller: complete user completion, free flush_desc
end
deactivate IFACE
Progress->>Progress: poll events (skip flush events)
end
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~25 minutes
Poem
Pre-merge checks and finishing touches✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
src/uct/cuda/base/cuda_iface.c(4 hunks)
🧰 Additional context used
🧬 Code graph analysis (1)
src/uct/cuda/base/cuda_iface.c (2)
src/ucs/debug/memtrack.c (2)
ucs_free(368-372)ucs_malloc(328-334)src/ucs/datastruct/mpool.c (7)
ucs_mpool_get(215-218)ucs_mpool_put(220-223)ucs_mpool_chunk_malloc(326-330)ucs_mpool_chunk_free(332-335)ucs_mpool_params_reset(60-73)ucs_mpool_init(81-147)ucs_mpool_cleanup(149-198)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (9)
- GitHub Check: UCX PR (Static_check Static checks)
- GitHub Check: UCX PR (Codestyle AUTHORS file update check)
- GitHub Check: UCX PR (Codestyle ctags check)
- GitHub Check: UCX PR (Codestyle codespell check)
- GitHub Check: UCX PR (Codestyle format code)
- GitHub Check: UCX PR (Codestyle commit title)
- GitHub Check: UCX release DRP (Prepare CheckRelease)
- GitHub Check: UCX release (Prepare CheckRelease)
- GitHub Check: UCX snapshot (Prepare Check)
🔇 Additional comments (7)
src/uct/cuda/base/cuda_iface.c (7)
260-265: LGTM: Clean flush event detection.The function pointer comparison is an effective way to identify flush events. The dual check (non-NULL comp and matching function) ensures correctness.
275-290: LGTM: Correct flush event handling in progress path.The updated logic properly distinguishes flush events from regular events:
- Flush events are extracted immediately (serving as queue markers)
- Regular events are queried for completion before extraction
- Flush events skip
complete_eventinvocation (line 285-287) since they represent flush markers rather than actual CUDA operationsThe flush-as-marker design is sound: when a flush event reaches the front of the queue, all preceding operations are guaranteed complete.
451-457: LGTM: Appropriate mpool configuration for flush descriptors.Unlike regular event descriptors that require CUDA event initialization, flush descriptors are simple structures that use queue ordering for synchronization. NULL init/cleanup callbacks are correct here.
465-486: LGTM: Proper flush mpool initialization.The constructor correctly:
- Initializes mpool parameters with appropriate element size
- Checks initialization status and returns errors properly
- Follows the same pattern as event descriptor mpool initialization elsewhere in the codebase
502-502: LGTM: Proper flush mpool cleanup with leak checking.The destructor correctly cleans up the flush mpool with leak checking enabled, matching the cleanup pattern for other mpools in the codebase.
209-211: No issues found after verification.The NULL completion handling is correct and follows expected UCT semantics. When
comp == NULL, the function returnsUCS_INPROGRESSif there are active streams, indicating work is pending without a callback request—consistent with other async transports like UD. The mechanism is well-documented in the code comments explaining the flush event queue mechanism.
242-257: Rollback logic appears sound — no issues found.The verification confirms the implementation is correct:
ucs_queue_iter_elemmacro correctly uses container_of to retrieve queue descriptors from the iterator over the embeddedqueuefield- Tail element assumption is valid — the rollback retrieves exactly
stream_counterelements (matching successful pushes) from each queue; since push appends to tail and retrieval grabs tail, the most-recently-pushed item is correctly identified for rollback- Type cast is safe —
superis the first field ofuct_cuda_flush_stream_desc_t, so castinguct_cuda_event_desc_t*back to the parent type is validThe code relies on single-threaded enqueue semantics with no concurrent progress drains during the error path, which appears consistent with the endpoint context.
What?
Add support for flush_ep
Why?
Required for cuda_ipc to be used in gtests
Summary by CodeRabbit